Sources with the lowest average word length (at least 10 sentences):

Average word length | # of sentences | Source |
---|---|---|
13.05 | 11 | http://as.wikipedia.org/wiki/মাঘ |
13.28 | 14 | http://as.wikipedia.org/wiki/কাঠৰ_জোখ_মাপ |
13.34 | 10 | http://as.wikipedia.org/wiki/দিখৌ_নদী |
13.48 | 20 | http://as.wikipedia.org/wiki/জুল |
13.58 | 11 | http://as.wikipedia.org/wiki/বকুল |
13.69 | 22 | http://as.wikipedia.org/wiki/বিঘা |
13.79 | 10 | http://as.wikipedia.org/wiki/কাটলফিছ |
14.09 | 17 | http://as.wikipedia.org/wiki/গম |
14.11 | 16 | http://as.wikipedia.org/wiki/ঘিলা |
14.23 | 35 | http://as.wikipedia.org/wiki/বনছাই |
14.25 | 31 | http://as.wikipedia.org/wiki/ইলীহ_মাছ |
14.28 | 10 | http://as.wikipedia.org/wiki/উকাহ |
14.29 | 12 | http://as.wikipedia.org/wiki/ভাত_কেৰেলা |
14.30 | 11 | http://as.wikipedia.org/wiki/বন-ৰৌ |
14.35 | 23 | http://as.wikipedia.org/wiki/বাইট |
14.36 | 18 | http://as.wikipedia.org/wiki/নাহৰ |
14.36 | 17 | http://as.wikipedia.org/wiki/জেঠুৱা_উৎসৱ |
14.37 | 25 | http://as.wikipedia.org/wiki/ক'কে |
14.37 | 10 | http://as.wikipedia.org/wiki/কেতেকী |
14.40 | 11 | http://as.wikipedia.org/wiki/জানুৱাৰী |
14.40 | 37 | http://as.wikipedia.org/wiki/সেন_সূচক |
14.42 | 82 | http://as.wikipedia.org/wiki/অসমৰ_মাছ_ধৰা_সঁজুলি |
14.44 | 21 | http://as.wikipedia.org/wiki/টকা_(বাদ্য) |
14.44 | 27 | http://as.wikipedia.org/wiki/গমাৰী |
14.45 | 14 | http://as.wikipedia.org/wiki/মৌ-মাখি |
14.48 | 30 | http://as.wikipedia.org/wiki/ড'ৰেমন |
14.51 | 84 | http://as.wikipedia.org/wiki/অসমৰ_ধান_খেতিৰ_সঁজুলিসমূহ |
14.53 | 22 | http://as.wikipedia.org/wiki/কংগো_দেশৰ_ৰন্ধনশৈলী |
14.54 | 35 | http://as.wikipedia.org/wiki/ঋতু |
14.56 | 14 | http://as.wikipedia.org/wiki/চিক_পুই_ৰুই |

Sources with the highest average word length (at least 10 sentences):

Average word length | # of sentences | Source |
---|---|---|
21.15 | 13 | http://as.wikipedia.org/wiki/ডিব্ৰুগড়_বিশ্ববিদ্যালয় |
20.76 | 13 | http://as.wikipedia.org/wiki/ভাৰতীয়_চিকিৎসা_গৱেষণা_পৰিষদ |
20.72 | 69 | http://as.wikipedia.org/wiki/পূৰ্ণতাবাদ |
20.70 | 13 | http://as.wikipedia.org/wiki/সুচিত্ৰা_চেবাষ্টিয়ান |
20.68 | 15 | http://as.wikipedia.org/wiki/গোলকীকৰণ |
20.53 | 10 | http://as.wikipedia.org/wiki/নুপিডিয়া |
20.52 | 12 | http://as.wikipedia.org/wiki/বন্দিতা_ফুকন |
20.49 | 14 | http://as.wikipedia.org/wiki/মেথড_অভিনয় |
20.46 | 13 | http://as.wikipedia.org/wiki/ডিজিটেল_মিডিয়াৰ_ব্যৱহাৰ_আৰু_মানসিক_স্বাস্থ্য |
20.36 | 12 | http://as.wikipedia.org/wiki/নিশ্চিতকৰণ_পক্ষপাতিতা |
20.33 | 10 | http://as.wikipedia.org/wiki/লাভলি_প্ৰফেচনেল_ইউনিভাৰচিটি |
20.30 | 10 | http://as.wikipedia.org/wiki/দীনেশ_চন্দ্ৰ_গোস্বামীৰ_স্বনিৰ্বাচিত_গল্প |
20.21 | 10 | http://as.wikipedia.org/wiki/গিৰিজানন্দ_চৌধুৰী_ব্যৱস্থাপনা_আৰু_প্ৰযুক্তি_প্ৰতিষ্ঠান,_তেজপুৰ |
20.18 | 15 | http://as.wikipedia.org/wiki/উইলিয়াম_জেম্ছ |
20.17 | 16 | http://as.wikipedia.org/wiki/ৰাধা_বালাকৃষ্ণন |
20.16 | 15 | http://as.wikipedia.org/wiki/ৱিকিমিডিয়া_ফাউণ্ডেশ্যন |
20.15 | 16 | http://as.wikipedia.org/wiki/সংস্কৃতি |
20.12 | 17 | http://as.wikipedia.org/wiki/কনক_ৰেলে |
20.10 | 17 | http://as.wikipedia.org/wiki/সুদেষ্ণা_সিন্হা |
20.10 | 23 | http://as.wikipedia.org/wiki/সংস্কৃতিৰ_বিকিৰণ |
20.06 | 10 | http://as.wikipedia.org/wiki/ৰেইনহাৰ্ড_গেনজেল |
20.02 | 15 | http://as.wikipedia.org/wiki/আন্তৰ্জাতিক_মানৱ_অধিকাৰ_আইন |
19.99 | 11 | http://as.wikipedia.org/wiki/প্ৰ'কোষকেন্দ্ৰীয়_জীৱ |
19.92 | 15 | http://as.wikipedia.org/wiki/ৰোদ্দম_নৰসিম্হা |
19.89 | 11 | http://as.wikipedia.org/wiki/ৰূপামঞ্জৰী_ঘোষ |
19.89 | 15 | http://as.wikipedia.org/wiki/আন্তঃৰাষ্ট্ৰীয়_ন্যায়ালয় |
19.89 | 21 | http://as.wikipedia.org/wiki/জিম_পিবলছ্ |
19.88 | 16 | http://as.wikipedia.org/wiki/অসমীয়া_সংস্কৃতিৰ_সমন্বয়_আৰু_সমাহৰণ |
19.87 | 11 | http://as.wikipedia.org/wiki/ৰবাৰ্ট_বি._ৱিলচন |
19.86 | 10 | http://as.wikipedia.org/wiki/ইলিয়াছ_আলি |
The problem addressed in this subsection, and its results, are similar to those of 6.4.1.1, but the focus here is on average word length rather than average sentence length.
The measured average word length depends strongly on tokenization. A typical tokenizer might split the string “28.06.2005” into the five tokens “28 . 06 . 2005”, with an average length of two. To avoid this, words are delimited by blanks only: the number of words in a sentence is counted as 1 + (number of blanks in the sentence).
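The difference between the two counting schemes can be sketched in Python. This is only an illustration: the regular expression stands in for a hypothetical punctuation-splitting tokenizer, and both function names are made up for this example. Note that Python's `len` counts Unicode code points, whereas a database `length` function may count bytes, so absolute values can differ from those in the tables.

```python
import re


def avg_word_length_tokenized(sentence: str) -> float:
    """Average token length under a hypothetical tokenizer that
    splits punctuation marks off as separate tokens."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return sum(len(t) for t in tokens) / len(tokens)


def avg_word_length_by_blanks(sentence: str) -> float:
    """Sentence length divided by (1 + number of blanks).
    The sentence length includes the blanks themselves."""
    blanks = sentence.count(" ")
    return len(sentence) / (1 + blanks)


date = "28.06.2005"
print(avg_word_length_tokenized(date))   # five tokens: 28 . 06 . 2005
print(avg_word_length_by_blanks(date))   # one word of ten characters
```

With blank-based counting, the date is a single word of length ten instead of five tokens of average length two.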
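The following query selects the 30 sources with the lowest average word length among the sources with at least 10 sentences:
select round(avg(length(sentence) / (1 + length(sentence) - length(replace(sentence, " ", "")))), 2) as le,
       count(sentence) as cnt,
       source
from sentences s, inv_so i, sources so
where s.s_id = i.s_id and i.so_id = so.so_id
group by source
having cnt >= 10
order by le
limit 30;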
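To try the query on a small scale, the sketch below runs an adapted version against an in-memory SQLite database using Python's `sqlite3` module. Several things here are assumptions rather than part of the original setup: the table layout is inferred from the names used in the query, the sentences and source names are toy data, and the helper `lowest_avg_word_length` is hypothetical. Two adaptations are made for SQLite: the blank is written with single quotes, and the numerator is multiplied by 1.0 because SQLite would otherwise perform integer division.

```python
import sqlite3

# Hypothetical toy schema, inferred from the table and column names in the query.
SCHEMA = """
CREATE TABLE sentences (s_id INTEGER PRIMARY KEY, sentence TEXT);
CREATE TABLE sources   (so_id INTEGER PRIMARY KEY, source TEXT);
CREATE TABLE inv_so    (s_id INTEGER, so_id INTEGER);
"""

# Same logic as the query in the text, adapted for SQLite (see lead-in).
QUERY = """
SELECT round(avg(length(sentence) * 1.0
                 / (1 + length(sentence) - length(replace(sentence, ' ', '')))), 2) AS le,
       count(sentence) AS cnt,
       source
FROM sentences s, inv_so i, sources so
WHERE s.s_id = i.s_id AND i.so_id = so.so_id
GROUP BY source
HAVING cnt >= :min_sentences
ORDER BY le
LIMIT 30
"""


def lowest_avg_word_length(conn, min_sentences=10):
    """Return (avg word length, sentence count, source) rows, shortest first."""
    return conn.execute(QUERY, {"min_sentences": min_sentences}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO sources VALUES (?, ?)",
                 [(1, "short.example"), (2, "long.example")])

# Toy data: (sentence id, sentence text, source id)
toy = [
    (1, "a bb cc", 1),
    (2, "dd ee", 1),
    (3, "internationalisation matters", 2),
    (4, "28.06.2005", 2),
]
conn.executemany("INSERT INTO sentences VALUES (?, ?)", [(s, t) for s, t, _ in toy])
conn.executemany("INSERT INTO inv_so VALUES (?, ?)", [(s, so) for s, _, so in toy])

# The toy sources have only two sentences each, so the threshold is lowered.
result = lowest_avg_word_length(conn, min_sentences=2)
for le, cnt, source in result:
    print(le, cnt, source)
```

Changing `ORDER BY le` to `ORDER BY le DESC` would list the sources with the highest average word length first.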
6.4.2.2 Average logarithmic word rank for different sources
6.4.2.3 Sources consisting of many / few words with frequency 1
6.4.2.4 Sources with low / high average word length of rare words